Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
Authors
Abstract
A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.
Similar resources
A New Vision-Based and GPS-Signal-Independent Approach in Jamming Detection and UAV Absolute Positioning Assessment
Positioning of Unmanned Aerial Vehicles (UAVs) in outdoor environments is usually done with the Global Positioning System (GPS). Because the GPS signal has low power at the earth's surface, its performance is disrupted in environments contaminated by jamming attacks. UAV positioning and its accuracy using GPS are degraded under jamming attacks. A positioning error about tens of...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, and the need to process it and extract useful information and meaningful patterns, have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
A Game Theoretic Approach for Greening, Pricing, And Advertising Policies in A Green Supply Chain
This paper examines greening, pricing, and advertising policies in a supply chain under government intervention. The supply chain has two members: a manufacturer that determines the wholesale price and the greening level, and a retailer that determines the advertising cost and the retail price. The government is trying to encourage the manufacturer to green t...
Evaluation Approaches of Value at Risk for Tehran Stock Exchange
The purpose of this study is to estimate the daily Value at Risk (VaR) for the total index of the Tehran Stock Exchange using parametric, nonparametric, and semi-parametric approaches. Conditional and unconditional coverage backtesting are used to evaluate the accuracy of the calculated VaR and to compare the performance of these approaches. In most cases, based on backtesting statistics results, ...
Hilbert Space Embeddings of POMDPs
A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes’ rule to these distribution embeddings. Policies and value functions are defin...
Publication date: 2002